Abstract

In this paper we investigate an extremal problem on binary phylogenetic trees.
Given two such trees $T_1$ and $T_2$, both with leaf-set $\{1, 2, \ldots, n\}$, we are interested in
the size of the largest subset $S \subseteq \{1, 2, \ldots, n\}$ of leaves in a common subtree of $T_1$
and $T_2$. We show that any two binary phylogenetic trees have a common subtree
on $\Omega(\sqrt{\log n})$ leaves, thus improving on the previously known bound of $\Omega(\log\log n)$
due to M. Steel and L. Székely. To achieve this improved bound, we first consider
two special cases of the problem: when one of the trees is balanced or a caterpillar,
we show that the largest common subtree has $\Omega(\log n)$ leaves. We then handle the
general case by proving and applying a Ramsey-type result: that every binary tree
contains either a large balanced subtree or a large caterpillar. We also show that
there are constants $c, \alpha > 0$ such that, when both trees are balanced, they have a
common subtree on $c n^{\alpha}$ leaves. We conjecture that it is possible to take $\alpha = 1/2$ in
the unrooted case, and both $c = 1$ and $\alpha = 1/2$ in the rooted case.

1 Preliminaries 

All trees considered in this paper are binary. Although we mainly talk about rooted trees, 
we introduce the problem in terms of unrooted trees to be consistent with earlier papers 
on the subject. 
1.1 Unrooted phylogenetic trees

A phylogenetic tree is a binary, unrooted tree in which the leaves are labelled bijectively 
with elements from a finite set. All internal vertices of a phylogenetic tree have degree 3. 
For such a tree T, the set of vertices is denoted by V(T), the set of edges by E(T), and 
the set of leaves by L(T). 

In phylogenetics, it is common to consider isomorphism between trees in a more restricted
sense. We say that trees $T_1$ and $T_2$ are isomorphic (and write $T_1 = T_2$) if there is
a bijection $\varphi : V(T_1) \to V(T_2)$ such that

(i) $\{\varphi(u), \varphi(v)\} \in E(T_2) \iff \{u, v\} \in E(T_1)$,

(ii) $\varphi(i) = i$ for all leaves $i \in L(T_1)$.

Observe that, while this notion of isomorphism also works for non-binary trees, there can 
be no isomorphism between trees that have distinct leaf-sets. 

For a subset $X \subseteq L(T)$, define the restriction of $T$ to $X$ to be the phylogenetic tree
$T|X$ with leaf-set $L(T|X) = X$, and the property that there exists an isomorphism (in
the sense of the previous paragraph) from a subdivision of $T|X$ to the unique minimal
connected subgraph of $T$ containing $X$. We loosely call the tree $T|X$ a subtree of $T$.
Given trees $T_1$ and $T_2$, if $X$ is a subset of $L(T_1) \cap L(T_2)$ of maximum cardinality with the
property that $T_1|X = T_2|X$, we say that $T_1|X$ (or $T_2|X$) is a maximum agreement subtree
of $T_1$ and $T_2$. We also define the parameter $\mathrm{mast}\{T_1, T_2\}$ as

$$\mathrm{mast}\{T_1, T_2\} := \max\{|X| : X \subseteq L(T_1) \cap L(T_2),\ T_1|X = T_2|X\}.$$

Now let

$$\mathrm{mast}(n) := \min\{\mathrm{mast}\{T_1, T_2\} : L(T_1) = L(T_2) = \{1, 2, \ldots, n\}\}.$$

It was shown by Kubicka, Kubicki, and McMorris in [1] that

$$c_1 (\log\log n)^{1/2} < \mathrm{mast}(n) < c_2 \log n, \qquad (1)$$

for some positive constants $c_1$ and $c_2$. To see that the upper bound in (1) is tight, consider
the case when $T_1$ is a caterpillar with $n$ leaves and $T_2$ is a balanced tree of height $\log n$
(to be defined in Section 2): any common subtree must be a caterpillar, and there is no
caterpillar of length more than $2\log n$ in $T_2$. The lower bound in (1) was improved by
Steel and Székely [2] who showed that $\mathrm{mast}(n) > c\log\log n$ for a positive constant $c$. In
fact, in a remark following Theorem 1 in their paper, they mention that a more explicit 
bound of | loglog(n — 1) may be derived, and suggest that a much stronger lower bound 
of clogn might hold, for some positive constant c. 

Problem 1. Is there a constant $c > 0$ such that any two phylogenetic trees $T_1$ and $T_2$
with $L(T_1) = L(T_2) = \{1, 2, \ldots, n\}$ have a maximum agreement subtree on at least $c\log n$
leaves?

One of the goals of this paper is to further improve the lower bound in (1). In
particular, in Theorem 14, we show that any two phylogenetic trees $T_1$ and $T_2$ with
leaf-set $\{1, 2, \ldots, n\}$ have an agreement subtree on $\Omega(\sqrt{\log n})$ leaves. Logarithms in this
paper are always taken with base 2.
1.2 Rooted phylogenetic trees

Next, we develop some terminology for rooted phylogenetic trees. In a rooted phylogenetic 
tree T, for |L(T)| > 1, all internal nodes have degree 3 except the root, which has degree 2. 
If |L(T)| = 1, then T has exactly one vertex which is both its only leaf and root. Let 
us denote the root of a tree $T$ by $\rho(T)$. For a vertex $u \in V(T)$, we denote by $T^{(u)}$ the
subtree of $T$ rooted at $u$ containing all descendants of $u$ in $T$. We denote by $\ell(u)$ and
$r(u)$, respectively, the left and the right children of an internal vertex $u$ of a rooted tree.

For a rooted tree $T$ and vertices $x$ and $y$ in $V(T)$, we define $x \wedge y$ to be the most recent
common ancestor of $x$ and $y$. Since $\wedge$ is associative and commutative, we may define the
most recent common ancestor of a set $X = \{x_1, x_2, \ldots, x_k\} \subseteq V(T)$ to be

$$\bigwedge X := x_1 \wedge x_2 \wedge \cdots \wedge x_k.$$

Given rooted trees $S$ and $T$, we say that $S$ is a subtree of $T$ (and write $S \preceq T$) if there
exists an injective map $f : V(S) \to V(T)$ satisfying

1. $f(x) = x$ for all $x \in L(S)$,

2. $f(x \wedge y) = f(x) \wedge f(y)$ for all $x, y \in V(S)$.

We say that rooted phylogenetic trees $T_1$ and $T_2$ are isomorphic if $T_1 \preceq T_2$ and $T_2 \preceq T_1$.

Now we define the restriction of $T$ to the set of leaves $X$ as the unique binary, rooted
phylogenetic tree $T|X$ having leaf-set $X$ and satisfying $T|X \preceq T$. The rooted maximum
agreement subtree and $\mathrm{mast}\{\cdot\}$ are defined as in the unrooted case.
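
To make the rooted definitions concrete, here is a minimal Python sketch, not taken from the paper. It assumes a rooted binary tree is encoded as nested pairs with integer leaf labels; the helper names `leaves`, `restrict`, and `mast` are hypothetical. The restriction is returned in a canonical form that ignores the left/right order of children, so two restrictions are isomorphic exactly when their canonical forms are equal. The search over subsets is exponential and is meant only for very small examples.

```python
from itertools import combinations

def leaves(t):
    """Leaf set of a tree encoded as nested pairs (a leaf is any non-tuple label)."""
    return frozenset([t]) if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])

def restrict(t, X):
    """Canonical form of T|X: a leaf label, or an unordered pair of restricted
    subtrees; None if no leaf of t lies in X (degree-2 vertices are suppressed)."""
    if not isinstance(t, tuple):
        return t if t in X else None
    left, right = restrict(t[0], X), restrict(t[1], X)
    if left is None:
        return right
    if right is None:
        return left
    return frozenset([left, right])

def mast(t1, t2):
    """Brute-force rooted mast{T1, T2}: the largest |X| with T1|X isomorphic to T2|X."""
    common = sorted(leaves(t1) & leaves(t2))
    for size in range(len(common), 0, -1):
        for X in combinations(common, size):
            if restrict(t1, set(X)) == restrict(t2, set(X)):
                return size
    return 0

# A rooted caterpillar and a rooted balanced tree on the leaf-set {1, 2, 3, 4}.
caterpillar = (1, (2, (3, 4)))
balanced = ((1, 2), (3, 4))
print(mast(caterpillar, balanced))  # 3, attained for example by X = {1, 3, 4}
```
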

Proposition 2 allows us to recursively construct agreement subtrees (but not necessarily
maximum agreement subtrees) of rooted trees.

Proposition 2. Let $T_1$ and $T_2$ be rooted, binary trees, with roots $u := \rho(T_1)$ and $v := \rho(T_2)$,
respectively. If $S_\ell \preceq T_1^{(\ell(u))}, T_2^{(\ell(v))}$ and $S_r \preceq T_1^{(r(u))}, T_2^{(r(v))}$, then the tree $S$ with left and
right subtrees $S_\ell$ and $S_r$, respectively, is such that $S \preceq T_1, T_2$.

Proof. Let $f_\ell$ and $f_r$ be maps that realize $S_\ell \preceq T_1^{(\ell(u))}$ and $S_r \preceq T_1^{(r(u))}$, respectively. Then
$S \preceq T_1$ is realized by defining a map $f : V(S) \to V(T_1)$ so that

$$f(x) := \begin{cases} f_\ell(x) & \text{if } x \in V(S_\ell), \\ f_r(x) & \text{if } x \in V(S_r), \\ \rho(T_1) & \text{if } x = \rho(S). \end{cases}$$

A similarly constructed map shows that $S \preceq T_2$. □

We denote the tree $S$ constructed from $S_\ell$ and $S_r$ as in the above proposition by $S_\ell \circ S_r$.

To prove the results in this paper, we first obtain agreement subtrees of rooted trees 
(constructed by rooting the given unrooted trees suitably), and then agreement subtrees 
of unrooted trees by ignoring the roots. 
2 When one of the trees is balanced



A rooted, phylogenetic tree is balanced if all leaves are at the same distance from the 
root. For unrooted trees, the definition is analogous. We say that a phylogenetic tree is 
balanced if all leaves are at the same distance from the center of the tree. In this section, 
we solve Problem 1 when one of the trees is balanced and binary.

2.1 The rooted case 

We first consider the case of rooted trees, thus assuming that one tree is a rooted, balanced
tree of height $m$ and another is a general rooted, binary tree. In Theorem 8 we prove a
result for the unrooted case by appropriately rooting the trees and applying Lemma 3.

In what follows, for nodes $x \in V(T_1)$ and $y \in V(T_2)$, let $t_{xy}$ be the number of elements
in the set $L(T_1^{(x)}) \cap L(T_2^{(y)})$.

Lemma 3. Suppose $T_1$ is a rooted, balanced, binary tree on a leaf-set of cardinality $2^m$,
and $T_2$ is an arbitrary rooted, binary tree on $t > 0$ leaves with $L(T_2) \subseteq L(T_1)$. Then for
all $\delta \in (0, 1/2)$, the two trees have a maximum agreement subtree that has at least

$$\frac{m\log(1-\delta) + \log t}{1 - \log\delta}$$

leaves.

Proof. Let $g(m,t)$ be the minimum value of $\mathrm{mast}\{T_1, T_2\}$ (over all choices of $T_1$ and
$T_2$), where $T_1$ and $T_2$ are as in the statement of the lemma. Observe that $g(m,t)$ is a
monotonically non-decreasing function of $t$. We show by induction on $m$ that

$$g(m,t) \ge \frac{m\log(1-\delta) + \log t}{1 - \log\delta}. \qquad (2)$$

Base case: If $m = 0$, then $g(m,t) = 1$ and the right-hand side of (2) is 0. So we may
assume that $m > 0$.

Induction step: Let $u := \rho(T_1)$ and $v := \rho(T_2)$. Observe that

$$t = t_{uv} = t_{\ell(u)\ell(v)} + t_{r(u)r(v)} + t_{\ell(u)r(v)} + t_{r(u)\ell(v)}. \qquad (3)$$

Without loss of generality, assume that $t_{\ell(u)\ell(v)} + t_{r(u)r(v)} \ge t_{\ell(u)r(v)} + t_{r(u)\ell(v)}$, and
$t_{r(u)r(v)} \ge t_{\ell(u)\ell(v)}$. Therefore, by (3), we have

$$t_{r(u)r(v)} \ge \lceil t/4 \rceil. \qquad (4)$$

Case 1: $t_{\ell(u)\ell(v)} > 0$.

In this case, we take a maximum agreement subtree $S_\ell$ of $T_1^{(\ell(u))}$ and $T_2^{(\ell(v))}$, and a
maximum agreement subtree $S_r$ of $T_1^{(r(u))}$ and $T_2^{(r(v))}$. We then construct $S = S_\ell \circ S_r$,
which by Proposition 2 is an agreement subtree of $T_1$ and $T_2$. Now

$$g(m,t) \ge |L(S)| = |L(S_\ell)| + |L(S_r)| \ge 1 + g(m-1, t_{r(u)r(v)}).$$

By (4), we have

$$g(m,t) \ge 1 + g(m-1, \lceil t/4 \rceil).$$

Therefore, applying the induction hypothesis, we obtain

$$g(m,t) \ge 1 + \frac{(m-1)\log(1-\delta) + \log(t/4)}{1 - \log\delta}
= \frac{m\log(1-\delta) + \log t}{1 - \log\delta} + \frac{-1 - \log(1-\delta) - \log\delta}{1 - \log\delta}
\ge \frac{m\log(1-\delta) + \log t}{1 - \log\delta}.$$



Case 2: $t_{\ell(u)\ell(v)} = 0$.

From now on, we assume that $t_{r(u)\ell(v)}$ is non-zero; otherwise, together with the
assumption of this case, it would contradict the hypothesis that $L(T_2) \subseteq L(T_1)$.

Subcase 2.1: $t_{\ell(u)r(v)} + t_{r(u)\ell(v)} \ge \delta t$ and $t_{\ell(u)r(v)} > 0$. Also, without loss of generality,
assume that $t_{r(u)\ell(v)} \ge t_{\ell(u)r(v)}$, so that $t_{r(u)\ell(v)} \ge \delta t/2$.

In this case, we construct $S := S_\ell \circ S_r$, where $S_\ell$ is a rooted maximum agreement
subtree of $T_1^{(\ell(u))}$ and $T_2^{(r(v))}$, and $S_r$ is a rooted maximum agreement subtree of
$T_1^{(r(u))}$ and $T_2^{(\ell(v))}$. Therefore,

$$g(m,t) \ge |L(S_\ell)| + |L(S_r)|
\ge 1 + g(m-1, \lceil \delta t/2 \rceil)
\ge 1 + \frac{(m-1)\log(1-\delta) + \log(\delta t/2)}{1 - \log\delta}
= \frac{m\log(1-\delta) + \log t}{1 - \log\delta} - \frac{\log(1-\delta)}{1 - \log\delta}
\ge \frac{m\log(1-\delta) + \log t}{1 - \log\delta}.$$



Subcase 2.2: $t_{\ell(u)r(v)} + t_{r(u)\ell(v)} < \delta t$. In this case,
$t_{r(u)r(v)} = t - t_{\ell(u)\ell(v)} - t_{\ell(u)r(v)} - t_{r(u)\ell(v)} > (1-\delta)t$.

Let $S$ be a rooted maximum agreement subtree of $T_1^{(r(u))}$ and $T_2^{(r(v))}$. We have

$$g(m,t) \ge |L(S)|
\ge g(m-1, \lceil (1-\delta)t \rceil)
\ge \frac{(m-1)\log(1-\delta) + \log((1-\delta)t)}{1 - \log\delta}
= \frac{m\log(1-\delta) + \log t}{1 - \log\delta}.$$
Subcase 2.3: $t_{\ell(u)r(v)} = 0$.

Let $S$ be a rooted maximum agreement subtree of $T_1^{(r(u))}$ and $T_2$. Therefore,

$$g(m,t) \ge |L(S)|
\ge g(m-1, t)
\ge \frac{(m-1)\log(1-\delta) + \log t}{1 - \log\delta}
= \frac{m\log(1-\delta) + \log t}{1 - \log\delta} - \frac{\log(1-\delta)}{1 - \log\delta}
\ge \frac{m\log(1-\delta) + \log t}{1 - \log\delta}.$$

With this we complete all subcases of the induction step. Therefore (2) holds and the
lemma is proved. □

An immediate consequence of the above lemma is the following corollary. 

Corollary 4. For $\delta \in (0, 1/2)$, set

$$\alpha := \frac{1 + \log(1-\delta)}{1 - \log\delta}. \qquad (5)$$

If $T_1$ is a rooted, balanced, binary tree on a leaf-set of cardinality $2^m$, and $T_2$ is an arbitrary
rooted, binary tree on $2^m$ leaves such that $L(T_2) = L(T_1)$, then $T_1$ and $T_2$ have a maximum
agreement subtree on at least $\alpha m$ leaves.

Remark 5. We found numerically that the maximum value of $\alpha$ is approximately 0.2055,
obtained when $\delta$ is approximately 0.1705.
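
The numerical values in Remark 5 can be reproduced with a short Python sketch. This is only an illustration, not the computation used in the paper: it assumes the formula $\alpha = (1 + \log_2(1-\delta))/(1 - \log_2\delta)$ from Corollary 4 and performs a plain grid search over $\delta \in (0, 1/2)$; the grid step is an arbitrary choice.

```python
import math

def alpha(delta):
    """alpha(delta) = (1 + log2(1 - delta)) / (1 - log2(delta)), as in Corollary 4."""
    return (1 + math.log2(1 - delta)) / (1 - math.log2(delta))

# Grid search over delta in (0, 1/2) with step 1e-5 (an arbitrary resolution).
grid = [i / 100000 for i in range(1, 50000)]
best_delta = max(grid, key=alpha)
best_alpha = alpha(best_delta)
print(f"delta ~ {best_delta:.4f}, alpha ~ {best_alpha:.4f}")
```
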

Remark 6. An algorithm to construct an agreement tree (though not necessarily a maximum
agreement subtree) is implicit in the proof of Lemma 3. Observe that in each of Case
1 and Subcase 2.1, we may take the agreement subtree $S_\ell$ to be a tree with a single leaf
(e.g. any leaf from $L(T_1^{(\ell(u))}) \cap L(T_2^{(\ell(v))})$ in Case 1 and any leaf from $L(T_1^{(\ell(u))}) \cap L(T_2^{(r(v))})$
in Subcase 2.1). Such a choice gives us an agreement subtree that is a caterpillar of length
$\alpha m$ in Corollary 4.

We explicitly describe a recursive algorithm, which we call Match1. The algorithm
takes as input two rooted, binary trees, the first one being balanced, and returns a set of
leaves in a common subtree that is a caterpillar. As in Lemma 3, algorithm Match1 also
depends on a parameter $\delta \in (0, 1/2)$.
Algorithm Match1$(T_1^{(u)}, T_2^{(v)})$

1: if $|L(T_1^{(u)})| = 1$ or $|L(T_2^{(v)})| = 1$ then return $L(T_1^{(u)}) \cap L(T_2^{(v)})$.

2: if necessary, interchange left and right subtrees in $T_1^{(u)}$ and/or $T_2^{(v)}$ so that
   $t_{\ell(u)r(v)} + t_{r(u)\ell(v)} \le t_{\ell(u)\ell(v)} + t_{r(u)r(v)}$ and $t_{\ell(u)\ell(v)} \le t_{r(u)r(v)}$.

3: if ($t_{\ell(u)\ell(v)} > 0$) then

   (a) select any leaf $z$ from $L(T_1^{(\ell(u))}) \cap L(T_2^{(\ell(v))})$,

   (b) return $\{z\} \cup$ Match1$(T_1^{(r(u))}, T_2^{(r(v))})$.

4: if $t_{r(u)\ell(v)} = 0$ then return Match1$(T_1^{(u)}, T_2^{(r(v))})$.

5: if $t_{\ell(u)r(v)} = 0$ then return Match1$(T_1^{(r(u))}, T_2^{(v)})$.

6: if ($t_{\ell(u)r(v)} + t_{r(u)\ell(v)} \ge \delta t_{uv}$) then

   (a) if necessary, interchange the left and the right subtrees of both $T_1^{(u)}$
       and $T_2^{(v)}$ so that $t_{\ell(u)r(v)} \le t_{r(u)\ell(v)}$,

   (b) select any leaf $z$ from $L(T_1^{(\ell(u))}) \cap L(T_2^{(r(v))})$,

   (c) return $\{z\} \cup$ Match1$(T_1^{(r(u))}, T_2^{(\ell(v))})$.

7: return Match1$(T_1^{(r(u))}, T_2^{(r(v))})$.
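
The pseudocode above can be turned into a short Python sketch. This is an illustrative transcription under stated assumptions, not the authors' implementation: trees are encoded as nested pairs with integer leaf labels, "select any leaf" is made deterministic by taking the smallest label, and the helpers `leaves` and `restrict` are hypothetical names used only to check that the returned leaf set induces isomorphic restrictions of both trees.

```python
def leaves(t):
    """Leaf set of a tree encoded as nested pairs (a leaf is any non-tuple label)."""
    return frozenset([t]) if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])

def restrict(t, X):
    """Canonical (order-free) form of T|X, or None if no leaf of t lies in X."""
    if not isinstance(t, tuple):
        return t if t in X else None
    left, right = restrict(t[0], X), restrict(t[1], X)
    if left is None or right is None:
        return right if left is None else left
    return frozenset([left, right])

def match1(t1, t2, delta):
    """Sketch of Match1; the numbered comments refer to the pseudocode lines."""
    L1, L2 = leaves(t1), leaves(t2)
    if len(L1) == 1 or len(L2) == 1:                      # line 1
        return set(L1 & L2)
    (lu, ru), (lv, rv) = t1, t2
    t = lambda x, y: len(leaves(x) & leaves(y))           # t_xy
    t_uv = len(L1 & L2)
    if t(lu, rv) + t(ru, lv) > t(lu, lv) + t(ru, rv):     # line 2: swap in T2
        lv, rv = rv, lv
    if t(lu, lv) > t(ru, rv):                             # line 2: swap in both
        lu, ru, lv, rv = ru, lu, rv, lv
    if t(lu, lv) > 0:                                     # line 3
        z = min(leaves(lu) & leaves(lv))
        return {z} | match1(ru, rv, delta)
    if t(ru, lv) == 0:                                    # line 4
        return match1((lu, ru), rv, delta)
    if t(lu, rv) == 0:                                    # line 5
        return match1(ru, (lv, rv), delta)
    if t(lu, rv) + t(ru, lv) >= delta * t_uv:             # line 6
        if t(lu, rv) > t(ru, lv):                         # line 6(a)
            lu, ru, lv, rv = ru, lu, rv, lv
        z = min(leaves(lu) & leaves(rv))                  # line 6(b)
        return {z} | match1(ru, lv, delta)                # line 6(c)
    return match1(ru, rv, delta)                          # line 7

T1 = (((1, 2), (3, 4)), ((5, 6), (7, 8)))   # balanced, m = 3
T2 = (((1, 5), (3, 7)), ((2, 6), (4, 8)))   # an arbitrary tree on the same leaves
S = match1(T1, T2, 0.17)
print(sorted(S), restrict(T1, S) == restrict(T2, S))  # [1, 6, 8] True
```

In this small example the returned set has three leaves and both restrictions are the caterpillar with cherry {6, 8} and remaining leaf 1.
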

Steel and Warnow [3] devised an efficient polynomial time algorithm for finding the 
maximum agreement subtree of two given binary trees. Our algorithm is not optimal, 
but it is easy to analyze; moreover, it is easy to guarantee a lower bound on the returned 
value. We show how the value of $\alpha$ in Lemma 3 may also be obtained by analyzing the
algorithm.

Alternative proof of Corollary 4. Set $x := \rho(T_1)$ and $y := \rho(T_2)$. We now analyze the
execution of a call to Match1$(T_1^{(x)}, T_2^{(y)})$.

First observe that, each time an instance of algorithm Match1 is being executed, it
calls itself recursively only once in that instance. That happens, say $k$ times, each time
going deeper in the recursion levels, until a base case is reached in line 1. Thus, we
may define two sequences of nodes $x = x_0, x_1, x_2, \ldots, x_k$ and $y = y_0, y_1, y_2, \ldots, y_k$ that
correspond to the roots of the trees passed as arguments in each triggered call. That
is, Match1$(T_1^{(x_0)}, T_2^{(y_0)})$ calls Match1$(T_1^{(x_1)}, T_2^{(y_1)})$, which in turn calls Match1$(T_1^{(x_2)}, T_2^{(y_2)})$,
and so on. Note that the nodes in each sequence need not be distinct. For example, if
Match1 is called from line 4, then $x_{i+1} = x_i$, and similarly, if Match1 is called from line 5,
then $y_{i+1} = y_i$.

As a shorthand notation, define $t_i := t_{x_i y_i}$ for $i = 0, 1, \ldots, k$.

Now suppose Match1$(T_1^{(u)}, T_2^{(v)})$ is being called with $u = x_i$ and $v = y_i$. Using our
notation, we have $t_i = t_{uv}$. For each possibility of calling Match1 recursively, we obtain a
lower bound for $t_{i+1}$ in terms of $t_i$.



After executing line 2, we have $t_{r(u)r(v)} \ge t_{uv}/4$. Hence, if the recursive call in line 3b
is triggered, we have

$$t_{i+1} \ge t_i/4. \qquad (6)$$

If the recursive call in line 4 is triggered, it is because $t_{\ell(u)\ell(v)} = 0$ and $t_{r(u)\ell(v)} = 0$, which
implies $t_{u\,r(v)} = t_{uv}$. Hence, in that case, we must have

$$t_{i+1} = t_i. \qquad (7)$$

Similarly, if the recursive call in line 5 is triggered, we have $t_{r(u)\,v} = t_{uv}$, which also
implies (7). The conditions in lines 6 and 6a imply that $t_{r(u)\ell(v)} \ge \delta t_{uv}/2$. Hence, if the
recursive call in line 6c is made, we must have

$$t_{i+1} \ge \delta t_i/2. \qquad (8)$$

In line 7, since $t_{\ell(u)r(v)} + t_{r(u)\ell(v)} < \delta t_{uv}$ and $t_{\ell(u)\ell(v)} = 0$, we have $t_{r(u)r(v)} > (1-\delta)t_{uv}$.
Hence, if Match1 is called from line 7, then we have

$$t_{i+1} > (1-\delta)t_i. \qquad (9)$$



Now suppose that, during the entire execution of the recursive algorithm, line 3b is
executed $a$ times, line 4 is executed $b$ times, line 5 is executed $c$ times, line 6c is executed
$d$ times, and line 7 is executed $e$ times. Since a new leaf $z$ is returned each time one of
the lines 3b or 6c is executed, the set returned by the outermost call Match1$(T_1^{(x)}, T_2^{(y)})$
(when the execution halts) contains precisely $a + d + 1$ leaves. Therefore, it is enough to
show the following.

Claim 7. $a + d \ge \alpha m$.

To prove the claim consider the sequence $t_0, t_1, \ldots, t_k$. For each $i$ in $\{0, 1, \ldots, k-1\}$,
we know that $t_{i+1}$ satisfies one of (6), (7), (8), or (9). Therefore, we have

$$t_k \ge t_0 \left(\frac{1}{4}\right)^{a} \left(\frac{\delta}{2}\right)^{d} (1-\delta)^{e}. \qquad (10)$$

Now observe that $t_0 = 2^m$. Also, since the $k$-th recursive call is a base case, we have
$t_k = 1$. Moreover, because $\delta < 1/2$, we have $1/4^a \ge (\delta/2)^a$. Using these observations and
the fact that $e \le m$, equation (10) yields

$$1 \ge 2^m \left(\frac{\delta}{2}\right)^{a+d} (1-\delta)^m.$$

Solving for $a + d$, we obtain

$$a + d \ge \left(\frac{1 + \log(1-\delta)}{1 - \log\delta}\right) m = \alpha m.$$

The claim is proved, and the corollary follows. □
2.2 The unrooted case

We now consider the case when one of the trees is unrooted and balanced. We define two
classes of unrooted, balanced trees. The center of a tree may be either a single vertex or
a pair of adjacent vertices. Let $m > 0$ be an integer. When all leaves of a tree are at
distance $m$ from the center and the center is a single vertex, we say that the tree is in
class $C_m$. When all leaves are at distance $m$ from the center and the center is a pair of
adjacent vertices, we say the phylogenetic tree is in class $B_{m+1}$. By construction, trees in
the class $C_m$ have $3 \times 2^{m-1}$ leaves, and trees in the class $B_m$ have $2^m$ leaves.

Theorem 8. If $T_1$ is a balanced phylogenetic tree on $n$ leaves, and $T_2$ is an arbitrary
phylogenetic tree on the same leaf-set, then they have an agreement subtree on at least
$\alpha\log(2n/3)$ leaves, where $\alpha$ is the constant defined in (5).

Proof. If $T_1$ is in class $B_m$, for some $m$, then $n = 2^m$. Let $\{x, y\}$ be its central edge.
We add a new vertex $z$, and replace the edge $\{x, y\}$ by edges $\{x, z\}$ and $\{y, z\}$, and root
the tree at $z$. For $T_2$, we add a new vertex $w$, replace an arbitrary edge $\{u, v\}$ by edges
$\{u, w\}$ and $\{v, w\}$, and root $T_2$ at $w$. Notice that, for any rooted agreement subtree of
$T_1^{(z)}$ and $T_2^{(w)}$, we may ignore the root and obtain an unrooted agreement subtree of $T_1$
and $T_2$. Applying Lemma 3 to $T_1^{(z)}$ and $T_2^{(w)}$ gives a lower bound of $\alpha m = \alpha\log n$ on the
size of the maximum agreement subtree of $T_1$ and $T_2$. The desired bound follows.

If $T_1$ is in class $C_m$, for some $m$, then $n = 3 \times 2^{m-1}$. Let $z$ be the center of $T_1$. Let $X$
be the set of leaves in two of the three branches rooted at $z$. Note that $T_1|X$ is in class
$B_m$ and $T_2|X$ is an arbitrary phylogenetic tree. Proceeding as in the above paragraph, we
obtain a lower bound of $\alpha m$ on the size of the maximum agreement subtree of $T_1|X$ and
$T_2|X$. Since $m = \log(2n/3)$, the trees $T_1$ and $T_2$ have an agreement subtree on at least
$\alpha\log(2n/3)$ leaves. □

The following proposition for the case when one of the trees is "almost balanced" is 
proved with little extra effort. 

Proposition 9. For every $k > 0$, there is a constant $\alpha_k > 0$ such that, if $T_1$ and $T_2$ are
binary trees on the same leaf-set of cardinality $n$, and $T_1$ has radius at most $k\log n - 1$,
then they have a maximum agreement subtree on at least $\alpha_k\log n$ leaves.

Proof. In tree $T_1$, we subdivide the central edge (if it has a central edge) or an edge
adjacent to the center (if its center is a single vertex), and root the tree at the newly
inserted vertex of degree 2. (We have bounded the radius of $T_1$ by $k\log n - 1$ and not
$k\log n$ only to allow the possibility that when we root $T_1$, its radius may increase by
1.) We root $T_2$ by subdividing an arbitrarily chosen edge. We then construct a rooted,
balanced, binary tree $T_1'$ of height $k\log n$ that contains $T_1$ as a subtree (in the sense that
$T_1 \preceq T_1'$). Now by Lemma 3, we assert that $T_1'$ and $T_2$ (hence also $T_1$ and $T_2$) have an
agreement subtree on at least

$$\frac{k\log n\,\log(1-\delta) + \log n}{1 - \log\delta}$$

leaves. We select $\delta$ sufficiently small to satisfy $1 + k\log(1-\delta) > 0$, and set

$$\alpha_k := \frac{1 + k\log(1-\delta)}{1 - \log\delta}.$$

Then there is an agreement subtree on at least $\alpha_k\log n$ leaves. Indeed, the above value of
$\alpha_k$ may also be obtained by (re)analyzing algorithm Match1 as in the alternative proof of
Corollary 4. □

3 General binary trees

Our approach to general binary trees is based on the following intuition: every binary
tree has large diameter or contains (as a restriction) a balanced subtree of large height.

For $0 \le k \le h$, let $f(h,k)$ be the maximum number of leaves a rooted tree of height
at most $h$ can have so that no restriction of the tree is a balanced, binary tree of height
more than $k$.

Lemma 10. If $h = k$ or $k = 0$, then $f(h,k) = 2^k$. If $0 < k < h$, then

$$f(h,k) = \sum_{i=0}^{k} \binom{h-i-1}{k-i} 2^i.$$

Proof. We claim the following recurrence for $f(h,k)$:

$$f(h,k) = \begin{cases} 2^k & \text{if } h = k \text{ or } k = 0, \\ f(h-1,k) + f(h-1,k-1) & \text{if } 0 < k < h. \end{cases} \qquad (11)$$

This is proved as follows. If $h = k$ or $k = 0$, then we have $f(h,k) = 2^k$, the extremal tree
being the rooted, balanced tree of height $k$.

Now suppose that $h > k > 0$. We first prove that $f(h,k) \le f(h-1,k) + f(h-1,k-1)$.
Let $T$ be a binary tree of height at most $h$ with more than $f(h-1,k) + f(h-1,k-1)$
leaves. Suppose $T$ has $x$ leaves in the left subtree and $y$ leaves in the right subtree.
Without loss of generality assume $y \le x$. If $x > f(h-1,k)$, then the left subtree of $T$
would have a restriction to a balanced, binary tree of height $k+1$. Therefore, we may
assume that $x \le f(h-1,k)$, which implies $f(h-1,k-1) < y \le x$. It follows that both
the left and the right subtrees have restrictions to balanced trees of height $k$, and that $T$
has a restriction to a balanced tree of height $k+1$, which is a contradiction.

Next we show that $f(h,k) \ge f(h-1,k) + f(h-1,k-1)$. Consider the tree $T(h,k)$
defined as follows: if $h = k$ or $k = 0$, then $T(h,k)$ is a balanced, binary tree of height $k$;
otherwise, its left subtree is an extremal tree for parameters $h-1$ and $k$, and its right
subtree is an extremal tree for parameters $h-1$ and $k-1$. Thus $T(h,k)$ has precisely
$f(h-1,k) + f(h-1,k-1)$ leaves, and does not contain a restriction that is a balanced
tree of height more than $k$.

Thus we have $f(h,k) = f(h-1,k) + f(h-1,k-1)$ for $0 < k < h$, and the tree $T(h,k)$
constructed above is an example of an extremal tree for parameters $h$ and $k$. In fact the
above arguments, together with induction on $h + k$, show that $T(h,k)$ is the unique such
tree. We skip the details.

Now the solution to the recurrence relation is obtained by expanding it until all terms
are expressed as $f(i,i) = 2^i$ for some $i > 0$ or $f(j,0) = 1$ for some $j > 0$.

Figure 1: An illustration for the recurrence in (11). [Labelled points: $(k,k)$, $(h,k)$, $(1,1)$, $(1,0)$, $(h-k,0)$.]



In Figure 1 above, each directed path from the point $(h,k)$ to $(i,i)$ contributes the
term $f(i,i) = 2^i$, and there are $\binom{h-i-1}{k-i}$ such paths; similarly every path from $(h,k)$ to
$(j,0)$ contributes $f(j,0) = 1$, and there are $\binom{h-j-1}{k-1}$ such paths. Hence, for $0 < k < h$, we
have

$$f(h,k) = \sum_{i=1}^{k} \binom{h-i-1}{k-i} 2^i + \sum_{j=1}^{h-k} \binom{h-j-1}{k-1}.$$

Since the second sum is $\binom{h-1}{k}$, the lemma follows. □

Corollary 11. For $1 \le k \le h$, we have $f(h,k) \le (2h)^k$.

Proof. We apply Lemma 10. If $k = 1$, then $f(h,k) = h + 1 \le (2h)^k$. If $1 < k < h$, then
we have

$$f(h,k) = \sum_{i=0}^{k} \binom{h-i-1}{k-i} 2^i \le \binom{h}{k} \sum_{i=0}^{k} 2^i \le (2h)^k.$$

If $k = h$, then $f(h,k) = 2^k \le (2h)^k$. □
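
As a sanity check on the recurrence (11) and on Corollary 11, the following Python sketch evaluates $f(h,k)$ directly from the recurrence and compares it with the closed form $\sum_{i=0}^{k}\binom{h-i-1}{k-i}2^i$ for $0 < k < h$. The closed form is written here as we read Lemma 10 from the surrounding derivation, so treat it as an assumption being tested; the function names are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def f(h, k):
    """f(h, k) computed from the recurrence (11)."""
    if h == k or k == 0:
        return 2 ** k
    return f(h - 1, k) + f(h - 1, k - 1)

def closed_form(h, k):
    """Assumed closed form for 0 < k < h: sum_{i=0}^{k} C(h-i-1, k-i) * 2^i."""
    return sum(comb(h - i - 1, k - i) * 2 ** i for i in range(k + 1))

# Compare both expressions and check the bound f(h, k) <= (2h)^k on a small range.
for h in range(2, 13):
    for k in range(1, h):
        assert f(h, k) == closed_form(h, k)
for h in range(1, 13):
    for k in range(1, h + 1):
        assert f(h, k) <= (2 * h) ** k
print(f(4, 2))  # 11 = f(3, 2) + f(3, 1) = 7 + 4
```
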



Define $\varphi(n,a) = \dfrac{(\log n)^a}{2}$ and $\psi(n,b) = \dfrac{(\log n)^b}{\log\log n}$.

Corollary 12. Given any $a, b \in (0,1)$ such that $a + b = 1$, every tree with $n > 2$ leaves
contains either a path of length at least $(\log n)^{\psi(n,b)}$ or a balanced subtree of height at least
$\varphi(n,a)$.
Proof. Let $a, b \in (0,1)$ such that $a + b = 1$. Let $k < \varphi(n,a)$ and $h < (\log n)^{\psi(n,b)}$. We
have

$$k\log(2h) = k + k\log h
< \varphi(n,a) + \varphi(n,a)\,\psi(n,b)\log\log n
= \frac{(\log n)^a}{2} + \frac{\log n}{2}
\le \log n. \qquad (12)$$

Hence, by Corollary 11, we conclude that $f(h,k) < n$. Now the corollary follows. □



Proposition 13. If $T_1$ and $T_2$ are binary trees on the same leaf-set of cardinality $n$, and
$T_1$ is a caterpillar, then they have a maximum agreement subtree on at least $\frac{1}{3}\log n$ leaves.

Proof. The proof of this fact goes along the lines of the proof of Theorem 1 in Steel and
Székely [2]. We sketch it here. We embed $T_1$ and $T_2$ in the plane so that the leaves of
$T_1$ are on one side of the longest path in $T_1$. Without loss of generality, suppose that the
leaves of $T_1$ appear in the order $1, 2, \ldots, n$. The embedding of $T_2$ imposes a circular order
on its leaves. We cut this circular order arbitrarily to get a linear order $i_1, i_2, \ldots, i_n$. Next
we find the longest monotone subsequence of $i_1, i_2, \ldots, i_n$; it has length at least $\sqrt{n}$ by the
Erdős–Szekeres Theorem. Let $X$ be the set of leaves in this subsequence. We restrict $T_1$
and $T_2$ to $X$, obtaining $T_1|X$ and $T_2|X$. Notice that $T_1|X$ is still a caterpillar. We further
restrict both trees to $Y \subseteq X$ so that $T_2|Y$ is a caterpillar with a maximum number of
leaves. Thus $|Y|$ is at least $\log n$ (the extremal case being when $T_2|X$ is balanced). Now
both $T_1|Y$ and $T_2|Y$ are caterpillars (see Figure 2 below). Now let $y_1, y_2, \ldots, y_k$ be the
elements of $Y$ in the order they appear in the embedding of $T_1$.



Figure 2: The embeddings of $T_1|Y$ and $T_2|Y$. [In $T_1|Y$ the leaves appear in the order $y_1, y_2, \ldots, y_k$; the embedding of $T_2|Y$ shows the cut and the three runs of leaves between $y_1$, $y_i$, $y_j$ and $y_k$.]



In Figure 2, we can see that there are three maximal agreement caterpillars, namely,
caterpillars with leaf-sets $\{y_1, y_2, \ldots, y_i\}$, $\{y_i, y_{i+1}, \ldots, y_j\}$ and $\{y_j, y_{j+1}, \ldots, y_k\}$. One of
them must have length at least $(k+2)/3 > \frac{1}{3}\log n$. □
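
The monotone-subsequence step in the proof above can be illustrated with a small Python sketch. It is only a sketch of that one step, not of the whole proof: the function names are hypothetical, the input is assumed to be a sequence of distinct numbers (such as the linear order $i_1, \ldots, i_n$), and the longest increasing subsequence is found by the standard patience-sorting method.

```python
from bisect import bisect_left
from itertools import permutations

def longest_increasing(seq):
    """One longest strictly increasing subsequence of seq (patience sorting)."""
    tails, tails_idx, prev = [], [], [None] * len(seq)
    for i, x in enumerate(seq):
        j = bisect_left(tails, x)          # pile on which x is placed
        if j == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[j] = x
            tails_idx[j] = i
        prev[i] = tails_idx[j - 1] if j > 0 else None
    out, i = [], (tails_idx[-1] if tails_idx else None)
    while i is not None:                   # walk the predecessor links backwards
        out.append(seq[i])
        i = prev[i]
    return out[::-1]

def longest_monotone(seq):
    """A longest increasing or decreasing subsequence, whichever is longer."""
    inc = longest_increasing(list(seq))
    dec = [-x for x in longest_increasing([-x for x in seq])]
    return inc if len(inc) >= len(dec) else dec

# Erdos-Szekeres: n distinct numbers contain a monotone subsequence of length >= sqrt(n).
example = [3, 2, 1, 6, 5, 4, 9, 8, 7]
print(longest_monotone(example))
ok = all(len(longest_monotone(p)) ** 2 >= len(p) for p in permutations(range(6)))
```
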
Theorem 14. If $T_1$ and $T_2$ are binary trees on the same leaf-set of cardinality $n > 2$,
then they have a maximum agreement subtree having at least $\frac{\alpha}{2}\sqrt{\log n} + \alpha\log\frac{2}{3}$ leaves.

Proof. Applying Corollary 12 (with $a = b = 1/2$), one of the trees must contain a balanced
subtree of height at least $\varphi(n, 1/2)$ or a path of length at least $(\log n)^{\psi(n,1/2)}$. Suppose
that one of the trees contains a balanced subtree of height at least $\varphi(n, 1/2)$. Let $A$ be the
leaf-set of such a balanced subtree. Therefore, after restricting the other tree to $A$, we can
claim by Theorem 8 that $T_1$ and $T_2$ have a common subtree on at least $\alpha\varphi(n, 1/2) + \alpha\log\frac{2}{3}$
leaves from $A$.

If such a balanced subtree does not exist in either of the two trees, then there is a path
(and hence a caterpillar) of length at least $(\log n)^{\psi(n,1/2)}$ in one of the trees. We restrict
both trees to the set of leaves in this caterpillar. Therefore, by Proposition 13, there
must be a common subtree on at least $\frac{1}{3}\psi(n, 1/2)\log\log n = \frac{1}{3}\sqrt{\log n}$ leaves. Taking the
maximum value of $\alpha$ as in Remark 5, this is a quantity larger than the desired bound. □

Corollary 15. Any given trees $T_1, T_2, \ldots, T_k$, where $k > 2$, on the same leaf set of
cardinality $n$ have a maximum agreement subtree on $\Omega(\log^{k-1} n)$ leaves (here $\log^{k-1}$
denotes the logarithm iterated $k-1$ times).

Proof. By Theorem 14, there is a common subtree $T_{1,2}$ of $T_1$ and $T_2$ on $\Omega(\sqrt{\log n})$ leaves.
In fact $T_{1,2}$ is a caterpillar. This is so in either of the two cases considered in Theorem 14.
If one of the trees contains a sufficiently large balanced subtree, then by Algorithm
Match1, we have a common caterpillar subtree on $\Omega(\sqrt{\log n})$ leaves. If neither of the
trees $T_1$ and $T_2$ contains a sufficiently large balanced subtree, then we restricted one of
the trees to a sufficiently large caterpillar. Thus by construction, there is a common
caterpillar subtree on $\Omega(\sqrt{\log n})$ leaves. We then restrict $T_3$ to the leaf set of $T_{1,2}$. Now
by repeated applications of Proposition 13, $T_{1,2}$ and $T_3$ must have a common caterpillar
subtree, say $T_{1,2,3}$, on $\Omega(\log(\sqrt{\log n})) = \Omega(\log\log n)$ leaves, and $T_{1,2,3}$ and $T_4$ have a
common caterpillar subtree on $\Omega(\log\log\log n)$ leaves, and so on. □



4 When both trees are balanced 

We now investigate the size of a maximum agreement subtree of two balanced, binary
trees. In this case, in Theorem 19 we obtain a much better bound than that of Theorem 8.

Lemma 16. Suppose $T_1$ and $T_2$ are rooted, balanced, binary trees of height $m_1$ and $m_2$,
respectively. Suppose that $|L(T_1) \cap L(T_2)| = t > 0$. Then for all $\delta \in (0, \frac{1}{4})$, the two trees
have a rooted maximum agreement subtree on at least $2^{g(m_1, m_2, t)}$ leaves, where

$$g(m_1, m_2, t) := \frac{(m_1 + m_2)\log(1-3\delta) + \log t}{\log(1-3\delta) - \log\delta}.$$

Proof. Let $M(m_1, m_2, t)$ be the minimum value of $\mathrm{mast}\{T_1, T_2\}$ (over all choices of $T_1$ and
$T_2$), where $T_1$ and $T_2$ are as in the statement of the lemma. Observe that $M(m_1, m_2, t)$ is a
monotonically non-decreasing function of $t$. We show the result by induction on $m_1 + m_2$.

Base case: When $m_1 + m_2 \in \{0, 1\}$, at least one of the trees has a single vertex (which
is its leaf and root), hence $\mathrm{mast}\{T_1, T_2\} = 1$. Also, since $t = 1$, we have $g(m_1, m_2, t) \le 0$,
and the claim is true. So we assume below that $m_1 \ge 1$ and $m_2 \ge 1$.

Induction step: Let $u := \rho(T_1)$ and $v := \rho(T_2)$. As in the proof of Lemma 3, we have
$t = t_{uv} = t_{\ell(u)\ell(v)} + t_{r(u)r(v)} + t_{\ell(u)r(v)} + t_{r(u)\ell(v)}$ and we assume, without loss of generality, that
$t_{\ell(u)\ell(v)} + t_{r(u)r(v)} \ge t_{\ell(u)r(v)} + t_{r(u)\ell(v)}$ and $t_{r(u)r(v)} \ge t_{\ell(u)\ell(v)}$ to obtain

$$t_{r(u)r(v)} \ge \lceil t/4 \rceil. \qquad (13)$$

Case 1: $t_{\ell(u)\ell(v)} \ge \delta t$.

By (13), we also have $t_{r(u)r(v)} \ge \delta t$. In this case, we take a maximum agreement subtree
$S_\ell$ of $T_1^{(\ell(u))}$ and $T_2^{(\ell(v))}$ and a maximum agreement subtree $S_r$ of $T_1^{(r(u))}$ and $T_2^{(r(v))}$. We
then construct $S = S_\ell \circ S_r$, which by Proposition 2 is an agreement subtree of $T_1$ and $T_2$.
Therefore, we have

$$M(m_1, m_2, t) \ge 2M(m_1 - 1, m_2 - 1, \lceil \delta t \rceil)
\ge 2 \times 2^{g(m_1-1,\, m_2-1,\, \lceil \delta t \rceil)}
\ge 2^{1 + g(m_1-1,\, m_2-1,\, \delta t)}
\ge 2^{g(m_1, m_2, t)},$$

where the last step follows from

$$1 + g(m_1-1, m_2-1, \delta t) = 1 + \frac{(m_1 - 1 + m_2 - 1)\log(1-3\delta) + \log(\delta t)}{\log(1-3\delta) - \log\delta}
= \frac{(m_1 + m_2)\log(1-3\delta) + \log t - \log(1-3\delta)}{\log(1-3\delta) - \log\delta}
\ge g(m_1, m_2, t),$$

where the last inequality requires that $\delta \in (0, \frac{1}{4})$.

Case 2: $t_{\ell(u)r(v)} \ge \delta t$ and $t_{r(u)\ell(v)} \ge \delta t$.

The calculation in this case is identical to that of Case 1, so we omit it.

Case 3: $t_{\ell(u)r(v)} < \delta t$ and $t_{r(u)\ell(v)} < \delta t$.

Since Case 1 has been examined, we assume that $t_{\ell(u)\ell(v)} < \delta t$, which implies $t_{r(u)r(v)} >
(1-3\delta)t$. Since $\mathrm{mast}\{T_1, T_2\}$ must be at least $\mathrm{mast}\{T_1^{(r(u))}, T_2^{(r(v))}\}$, we have $M(m_1, m_2, t) \ge
M(m_1 - 1, m_2 - 1, \lceil (1-3\delta)t \rceil)$. Now the result follows from the assumption that $\delta \in (0, \frac{1}{4})$
and the following:

$$g(m_1-1, m_2-1, (1-3\delta)t)
= \frac{(m_1 - 1 + m_2 - 1)\log(1-3\delta) + \log((1-3\delta)t)}{\log(1-3\delta) - \log\delta}
= \frac{(m_1 + m_2)\log(1-3\delta) + \log t - \log(1-3\delta)}{\log(1-3\delta) - \log\delta}
\ge g(m_1, m_2, t).$$
Case 4: $t_{\ell(u)r(v)} < \delta t$ and $t_{r(u)\ell(v)} \ge \delta t$.

Since Case 1 has been examined, we assume that $t_{\ell(u)\ell(v)} < \delta t$, which implies $t_{r(u)\ell(v)} +
t_{r(u)r(v)} > (1-2\delta)t$. In this case, since $\mathrm{mast}\{T_1, T_2\}$ must be at least $\mathrm{mast}\{T_1^{(r(u))}, T_2\}$,
we can write $M(m_1, m_2, t) \ge M(m_1 - 1, m_2, \lceil (1-2\delta)t \rceil)$. Now the result follows from the
assumption that $\delta \in (0, \frac{1}{4})$ and the following:

$$g(m_1-1, m_2, (1-2\delta)t)
= \frac{(m_1 - 1 + m_2)\log(1-3\delta) + \log((1-2\delta)t)}{\log(1-3\delta) - \log\delta}
= \frac{(m_1 + m_2)\log(1-3\delta) + \log t + \log(1-2\delta) - \log(1-3\delta)}{\log(1-3\delta) - \log\delta}
\ge g(m_1, m_2, t).$$

Case 5: $t_{\ell(u)r(v)} \ge \delta t$ and $t_{r(u)\ell(v)} < \delta t$.

The analysis of this case is similar to Case 4, except that we have the inequality
$M(m_1, m_2, t) \ge M(m_1, m_2 - 1, \lceil (1-2\delta)t \rceil)$. □



Corollary 17. Let

$$\delta \in \left(0, \frac{1}{3} - \frac{1}{3\sqrt{2}}\right) \quad \text{and} \quad \beta := \frac{1 + 2\log(1-3\delta)}{\log(1-3\delta) - \log\delta}.$$

If $T_1$ and $T_2$ are rooted, balanced, binary trees on the same leaf-set of cardinality $2^m$, then
$T_1$ and $T_2$ have a maximum agreement subtree on at least $2^{\beta m}$ leaves.

Proof. We set $m_1 = m_2 = m$ and $t = 2^m$ in Lemma 16. Moreover, we now require $\delta$ to
be less than $\frac{1}{3} - \frac{1}{3\sqrt{2}}$ (which is less than 1/4) so as to ensure that $\beta$ is positive. □
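
The role of the threshold on $\delta$ can be checked numerically. The Python sketch below assumes the reading $\beta = (1 + 2\log_2(1-3\delta))/(\log_2(1-3\delta) - \log_2\delta)$ and the threshold $\frac{1}{3} - \frac{1}{3\sqrt{2}}$, at which the numerator $1 + 2\log_2(1-3\delta)$ vanishes; the function name `beta` and the sample values of $\delta$ are illustrative choices, not values taken from the paper.

```python
import math

def beta(delta):
    """beta(delta) = (1 + 2*log2(1 - 3*delta)) / (log2(1 - 3*delta) - log2(delta))."""
    a = math.log2(1 - 3 * delta)
    return (1 + 2 * a) / (a - math.log2(delta))

# At the threshold, 1 - 3*delta = 1/sqrt(2), so the numerator of beta is zero.
threshold = 1 / 3 - 1 / (3 * math.sqrt(2))

for delta in (0.01, 0.025, 0.05, 0.09):
    print(f"delta = {delta:.3f}  beta = {beta(delta):.4f}")
print(f"threshold = {threshold:.4f}  beta(threshold) = {beta(threshold):.2e}")
```
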

As in Section 2, we present algorithm Match2 that closely follows the recursions in the
proof of Lemma 16. It takes as input two rooted, balanced, binary trees, and returns a set
of leaves in a common subtree. Algorithm Match2 depends on a real positive $\delta$, which we
require to be sufficiently small for the algorithm to give a desired bound on the size of a
common subtree. The algorithm is somewhat greedy and suboptimal. The analysis of the
performance of Match2 makes Lemma 16 much more transparent, giving an alternative
proof of Corollary 17. We then apply the corollary to prove the main results of this section
for unrooted, balanced (or "almost balanced") trees.
Algorithm Match2($T_1^{(u)}$, $T_2^{(v)}$)

1: if $|L(T_1^{(u)})| = 1$ or $|L(T_2^{(v)})| = 1$ then
       return $L(T_1^{(u)}) \cap L(T_2^{(v)})$.

2: if necessary, interchange left and right subtrees in $T_1^{(u)}$ and/or $T_2^{(v)}$ so that
       $t_{\ell(u)r(v)} + t_{r(u)\ell(v)} \le t_{\ell(u)\ell(v)} + t_{r(u)r(v)}$ and $t_{\ell(u)\ell(v)} \le t_{r(u)r(v)}$.

3: if ($t_{\ell(u)\ell(v)} \ge \delta t_{uv}$ and $t_{r(u)r(v)} \ge \delta t_{uv}$) then
       return Match2($T_1^{(\ell(u))}$, $T_2^{(\ell(v))}$) $\cup$ Match2($T_1^{(r(u))}$, $T_2^{(r(v))}$).

4: if ($t_{\ell(u)r(v)} \ge \delta t_{uv}$ and $t_{r(u)\ell(v)} \ge \delta t_{uv}$) then
       return Match2($T_1^{(\ell(u))}$, $T_2^{(r(v))}$) $\cup$ Match2($T_1^{(r(u))}$, $T_2^{(\ell(v))}$).

5: if ($t_{\ell(u)r(v)} < \delta t_{uv}$ and $t_{r(u)\ell(v)} < \delta t_{uv}$) then
       return Match2($T_1^{(r(u))}$, $T_2^{(r(v))}$).

6: if ($t_{\ell(u)r(v)} < \delta t_{uv}$ and $t_{r(u)\ell(v)} \ge \delta t_{uv}$) then
       return Match2($T_1^{(r(u))}$, $T_2^{(v)}$).

7: if ($t_{\ell(u)r(v)} \ge \delta t_{uv}$ and $t_{r(u)\ell(v)} < \delta t_{uv}$) then
       return Match2($T_1^{(u)}$, $T_2^{(r(v))}$).
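The listing above can be turned into a short executable sketch. The version below is an illustration under stated assumptions, not the authors' implementation: trees are represented as nested pairs (a representation chosen here for convenience), $t_{ab}$ is taken to be the number of leaf labels shared by the two subtrees, and the names `leaves` and `match2` are hypothetical. Leaf sets are recomputed on every call, so the sketch favours clarity over speed.

```python
def leaves(tree):
    """Leaf set of a tree given as nested pairs; a leaf is any non-tuple label."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def match2(t1, t2, delta=0.09):
    """Greedy sketch of Match2: returns the leaves of a common rooted subtree."""
    # Line 1: one of the two trees is a single leaf.
    if not isinstance(t1, tuple) or not isinstance(t2, tuple):
        return leaves(t1) & leaves(t2)
    (lu, ru), (lv, rv) = t1, t2

    def t(a, b):
        # Number of leaf labels common to the subtrees a and b.
        return len(leaves(a) & leaves(b))

    # Line 2: relabel children so that the "diagonal" pairs dominate
    # and the right-right pair is at least as large as the left-left pair.
    if t(lu, rv) + t(ru, lv) > t(lu, lv) + t(ru, rv):
        lv, rv = rv, lv
    if t(lu, lv) > t(ru, rv):
        lu, ru, lv, rv = ru, lu, rv, lv

    bound = delta * t(t1, t2)
    if t(lu, lv) >= bound and t(ru, rv) >= bound:          # line 3
        return match2(lu, lv, delta) | match2(ru, rv, delta)
    small_lr = t(lu, rv) < bound
    small_rl = t(ru, lv) < bound
    if not small_lr and not small_rl:                      # line 4
        return match2(lu, rv, delta) | match2(ru, lv, delta)
    if small_lr and small_rl:                              # line 5
        return match2(ru, rv, delta)
    if small_lr:                                           # line 6
        return match2(ru, (lv, rv), delta)
    return match2((lu, ru), rv, delta)                     # line 7


if __name__ == "__main__":
    print(sorted(match2(((1, 2), (3, 4)), ((1, 3), (2, 4)))))
```

Once lines 3 and 4 have failed, exactly one of the conditions in lines 5 to 7 holds, so the final `return` corresponds to line 7.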



We now analyze the above algorithm to compute $\beta$ in Corollary 17 more transparently.



Alternative proof of Corollary 17. We prove the result by analyzing Match2. In the beginning, we call Match2($T_1^{(x)}$, $T_2^{(y)}$), where $x := \rho(T_1)$ and $y := \rho(T_2)$. Let $\mathcal{T}$ be the tree of recursive calls to Match2 constructed as follows: the pair $(x, y)$ is the root of $\mathcal{T}$. If Match2($T_1^{(u)}$, $T_2^{(v)}$) is called during the execution of the algorithm, then $(u, v)$ is a vertex of $\mathcal{T}$. If Match2($T_1^{(u)}$, $T_2^{(v)}$) calls Match2($T_1^{(u')}$, $T_2^{(v')}$), then $(u', v')$ is a child of $(u, v)$. The leaf vertices of $\mathcal{T}$ correspond to the function calls that return in line 1. Observe that in line 1 a set containing a single new leaf is returned. By construction, the number of leaves in the common subtree returned by Match2 is precisely the number of leaves of $\mathcal{T}$.

The ideas in this proof are similar to those in the alternative proof of Corollary 4. We consider an arbitrary root-to-leaf path in $\mathcal{T}$ and we show that it branches at least $\beta m$ times. We then conclude that $\mathcal{T}$ has at least $2^{\beta m}$ leaves, thereby proving the corollary.

Now consider an arbitrary root-to-leaf path $(x_0, y_0), (x_1, y_1), \ldots, (x_k, y_k)$ in $\mathcal{T}$, where $(x_0, y_0) := (x, y)$. As a shorthand notation, define

\[ t_i := t_{x_i y_i}. \]

Now suppose Match2($T_1^{(u)}$, $T_2^{(v)}$) is being called with $u = x_i$ and $v = y_i$. Using our notation, we have $t_i = t_{uv}$. For each possibility of calling Match2 recursively, we obtain a lower bound for $t_{i+1}$ in terms of $t_i$.






First observe that, as in the case of Match1, we relabel $\ell(u), r(u)$ and $\ell(v), r(v)$ so that we have $t_{\ell(u)\ell(v)} + t_{r(u)r(v)} \ge t_{uv}/2$ and $t_{r(u)r(v)} \ge t_{\ell(u)\ell(v)}$ (which implies $t_{r(u)r(v)} \ge t_{uv}/4$). Hence, for the choice of $\delta$, we have

\[ t_{r(u)r(v)} \ge \tfrac{1}{4} t_{uv} \ge \delta t_{uv}. \tag{14} \]

If a recursive call in line 3 is triggered, then $(x_{i+1}, y_{i+1})$ is either $(\ell(u), \ell(v))$ or $(r(u), r(v))$. In both cases, under the conditions in line 3, we have

\[ t_{i+1} \ge \delta t_i. \tag{15} \]

Similarly, if a recursive call in line 4 is triggered, then we also have (15).

After lines 3 and 4, we must have $t_{\ell(u)\ell(v)} < \delta t_{uv}$ because $t_{\ell(u)\ell(v)} \le t_{r(u)r(v)}$ and the condition in line 3 has failed.

If the recursive call in line 5 is triggered, then $(x_{i+1}, y_{i+1}) = (r(u), r(v))$, and

\[ t_{i+1} \ge (1-3\delta) t_i, \tag{16} \]

because $t_{\ell(u)\ell(v)} < \delta t_{uv}$, $t_{\ell(u)r(v)} < \delta t_{uv}$ and $t_{r(u)\ell(v)} < \delta t_{uv}$.

If the recursive call in line 6 is triggered, then $(x_{i+1}, y_{i+1}) = (r(u), v)$, and

\[ t_{i+1} \ge (1-2\delta) t_i, \tag{17} \]

because $t_{\ell(u)\ell(v)} < \delta t_{uv}$ and $t_{\ell(u)r(v)} < \delta t_{uv}$.

Similarly, if the recursive call in line 7 is triggered, then $(x_{i+1}, y_{i+1}) = (u, r(v))$, and (17) holds because $t_{\ell(u)\ell(v)} < \delta t_{uv}$ and $t_{r(u)\ell(v)} < \delta t_{uv}$.

Along the chosen path in $\mathcal{T}$, suppose that line 3 is executed $a$ times, line 4 is executed $b$ times, line 5 is executed $c$ times, line 6 is executed $d$ times, and line 7 is executed $e$ times. In each of the recursive calls, the height of at least one of the trees decreases by 1. Therefore, $a + b + c + d + e \le 2m$. Consequently, we have

\[
\begin{aligned}
t_k &\ge t_0\, \delta^{a+b} (1-3\delta)^{c} (1-2\delta)^{d+e} \\
&\ge 2^m \delta^{a+b} (1-3\delta)^{c+d+e} \\
&\ge 2^m \delta^{a+b} (1-3\delta)^{2m-a-b} \\
&= 2^m \left(\frac{\delta}{1-3\delta}\right)^{a+b} (1-3\delta)^{2m}.
\end{aligned}
\tag{18}
\]

Since $(x_k, y_k)$ is a leaf of $\mathcal{T}$, we must have $t_k = 1$. Hence the right-hand side of (18) must be at most 1, which implies, for the choice of $\beta$, that $a + b \ge \beta m$. Now, a positive $\delta$ less than $\frac{1}{3} - \frac{1}{3\sqrt{2}}$ guarantees that $\beta$ is positive.

We have shown that each root-to-leaf path in $\mathcal{T}$ branches at least $a + b \ge \beta m$ times, which further implies that there must be at least $2^{\beta m}$ leaves in $\mathcal{T}$. Hence $T_1$ and $T_2$ must have a common subtree on at least $2^{\beta m}$ leaves. □
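The last step of the argument, passing from (18) to $a + b \ge \beta m$, can be illustrated numerically. The sketch below is a hypothetical helper, not part of the paper: it takes the logarithm (base 2, an assumption consistent with $t_0 = 2^m$) of the right-hand side of (18) and finds the smallest integer value of $a + b$ for which that side is at most 1.

```python
import math


def beta(delta):
    """Exponent beta of Corollary 17 (base-2 logarithms)."""
    log = math.log2
    return (1 + 2 * log(1 - 3 * delta)) / (log(1 - 3 * delta) - log(delta))


def log_rhs_18(m, delta, s):
    """log2 of 2^m * (delta / (1 - 3 delta))^s * (1 - 3 delta)^(2m), with s = a + b."""
    return (
        m
        + s * math.log2(delta / (1 - 3 * delta))
        + 2 * m * math.log2(1 - 3 * delta)
    )


def min_branchings(m, delta):
    """Smallest integer s = a + b for which the right-hand side of (18) is at most 1."""
    s = 0
    while log_rhs_18(m, delta, s) > 0:
        s += 1
    return s


if __name__ == "__main__":
    for m in (4, 8, 16):
        print(m, min_branchings(m, 0.05), beta(0.05) * m)
```

For admissible $\delta$ the value returned is the ceiling of $\beta m$, which matches the rearrangement $(a+b)(\log(1-3\delta) - \log\delta) \ge m(1 + 2\log(1-3\delta))$.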
Remark 18. Observe that in the above analysis, we showed that each root-to-leaf path in $\mathcal{T}$ has length at least $\beta m$, which implies that it is possible to find an agreement subtree with at least $2^{\beta m}$ leaves that is also balanced (and of height at least $\beta m$). One of the implications of this observation, which we state without proof, is that Lemma 16 together with algorithm Match2 may be used to obtain a lower bound of $2^{\gamma m}$ for an agreement subtree of more than two balanced binary trees of height $m$ for a sufficiently small positive $\gamma$. For example, we call algorithm Match2 for two rooted, balanced trees $T_1$ and $T_2$, obtaining a rooted, balanced agreement subtree, say $T_{12}$, and then call algorithm Match2 for $T_{12}$ and $T_3$, obtaining a rooted, balanced agreement subtree $T_{123}$, and so on.

Theorem 19. There exists a constant $c > 0$ such that, if $T_1$ and $T_2$ are balanced, binary trees on the same leaf-set, both in $B_m$ or both in $C_m$, then they have a maximum agreement subtree on at least $2^{\beta m - c}$ leaves.

Proof. As in Theorem 8, we consider the two cases: the trees are either both in class $B_m$ or both in class $C_m$. When the trees are both in class $B_m$, the proof is analogous to the corresponding case in Theorem 8, except that it invokes Corollary 17 instead of Lemma 3.

When the trees are in class $C_m$, the analysis differs only slightly from that in Theorem 8. We delete one of the branches of $T_1$ rooted at the center, and root the resulting tree at the center (which now has degree 2). Let $X$ be the leaf-set of the pruned tree. We cannot simply take a restriction of $T_2$ to the leaf-set $X$ as in Theorem 8, since $T_2|X$ may not be a balanced tree. We instead delete one of the branches of $T_2$ rooted at its center, and root the pruned tree at its center. Let $Y$ be the leaf-set of the pruned tree. Now we can ensure that $|X \cap Y| \ge 2^{m+1}/3$ by appropriately choosing the branches of $T_1$ and $T_2$ to be deleted. We apply Match2 to the rooted trees $T_1|X$ and $T_2|Y$. The analysis of Match2 does not change, except that we now have a constant factor $2/3$ on the right-hand side of (18). Therefore, with $a, b, c, d, e$ defined as in the alternative proof of Corollary 17, we have

\[ a + b \ge m \left( \frac{1 + 2\log(1-3\delta) - (1/m)\log(3/2)}{\log(1-3\delta) - \log\delta} \right). \]

Taking $c = (\log 3 - 1)/(\log(1-3\delta) - \log\delta)$, there are at least $2^{\beta m - c}$ leaves in a common subtree. □

In fact, we have a similar result when the two trees are "almost balanced". 

Proposition 20. For every $k > 0$, there is a constant $\beta_k > 0$ such that, if $T_1$ and $T_2$ are binary trees on a leaf-set of cardinality $n$, each of radius at most $k \log n$, then they have a maximum agreement subtree on at least $n^{\beta_k}$ leaves.



Proof. The proof is analogous to the alternative proof of Corollary 17; the only change is that now we have $a + b + c + d + e \le 2k \log n$. We use a value of $\delta > 0$ such that

\[ \beta_k := \frac{1 + 2k\log(1-3\delta)}{\log(1-3\delta) - \log\delta} \]

is positive, and we have $a + b \ge \beta_k \log n$. □
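For a given $k$, the admissible range of $\delta$ can be made explicit: $\beta_k > 0$ requires $1 + 2k\log(1-3\delta) > 0$, that is, $\delta < (1 - 2^{-1/(2k)})/3$ with base-2 logarithms. The sketch below is a hedged illustration of this rearrangement; the names `beta_k` and `delta_threshold` are hypothetical, and the cap at $1/4$ is an added safeguard (an assumption made here) that keeps the denominator positive.

```python
import math


def beta_k(delta, k):
    """Exponent beta_k of Proposition 20 (base-2 logarithms assumed)."""
    log = math.log2
    return (1 + 2 * k * log(1 - 3 * delta)) / (log(1 - 3 * delta) - log(delta))


def delta_threshold(k):
    """Largest delta below which beta_k is positive, capped at 1/4."""
    return min(0.25, (1 - 2 ** (-1 / (2 * k))) / 3)


if __name__ == "__main__":
    for k in (1, 2, 5):
        d = delta_threshold(k) / 2
        print(k, d, beta_k(d, k))
```

For $k = 1$ the threshold reduces to $\frac{1}{3} - \frac{1}{3\sqrt{2}}$, the bound used in Corollary 17.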
Concerning the maximum agreement subtree problem for balanced trees, we believe the following.

Conjecture 21. Any two balanced, rooted, binary trees of height $m$ have an agreement subtree on at least $2^{m/2}$ leaves.

We now describe an example of a pair of rooted, balanced, binary trees of height $2k$, for each $k > 0$, which we believe is an extremal example. Let $T_1$ and $T_2$ be balanced, binary trees of height $2k$, rooted at $u$ and $v$, respectively, and both drawn top-down. Let the leaves of $T_1$ be labelled $1, 2, \ldots, 2^{2k}$ from left to right. We label the leaves of $T_2$ from left to right according to the sequence $\mathrm{swap}(1, 2, \ldots, 2^{2k})$, which we define recursively as follows:

1. If $S$ is a sequence of length 1, then

   $\mathrm{swap}(S) := S$.

2. If $S$ is a sequence of length $4^i$, with $i > 0$, written as $S := S_1 : S_2 : S_3 : S_4$ as a concatenation of 4 sequences of length $4^{i-1}$ each, then

   $\mathrm{swap}(S) := \mathrm{swap}(S_1) : \mathrm{swap}(S_3) : \mathrm{swap}(S_2) : \mathrm{swap}(S_4)$.
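The recursive definition of swap translates directly into code. The following is a minimal sketch, assuming sequences are given as Python lists; the function name `swap` mirrors the text, while the error handling for lengths that are not powers of 4 is an addition made here.

```python
def swap(seq):
    """Recursive reordering from the text: the two middle quarters trade places."""
    seq = list(seq)
    if len(seq) == 1:
        return seq
    quarter, remainder = divmod(len(seq), 4)
    if remainder != 0:
        raise ValueError("sequence length must be a power of 4")
    s1, s2, s3, s4 = (seq[i * quarter:(i + 1) * quarter] for i in range(4))
    return swap(s1) + swap(s3) + swap(s2) + swap(s4)


if __name__ == "__main__":
    print(swap(range(1, 5)))   # [1, 3, 2, 4]
    print(swap(range(1, 17)))
```

For $k = 1$ this gives the leaf order 1, 3, 2, 4 for $T_2$, which is the base case used in the proof of Proposition 22.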

Proposition 22. Trees $T_1$ and $T_2$ have no rooted agreement subtree with more than $2^k$ leaves.

Proof. We prove the result by induction on $k$. When $k = 1$, the trees have 4 leaves, with the leaves of $T_1$ labelled 1, 2, 3, 4 from left to right, while the leaves of $T_2$ are labelled 1, 3, 2, 4 from left to right. In this case, a rooted agreement subtree cannot have more than 2 leaves.

In the general case, the inductive argument goes as follows. Let $u_1$ and $u_2$ be the children of $\ell(u)$, and let $u_3$ and $u_4$ be the children of $r(u)$. Similarly, in $T_2$, we label the grandchildren of $v$ by $v_1, v_2, v_3, v_4$. By construction, we have $L(T_1^{(u_1)}) = L(T_2^{(v_1)})$, $L(T_1^{(u_2)}) = L(T_2^{(v_3)})$, $L(T_1^{(u_3)}) = L(T_2^{(v_2)})$, and $L(T_1^{(u_4)}) = L(T_2^{(v_4)})$. But a rooted balanced agreement subtree cannot have leaves from more than two of the sets $L(T_1^{(u_i)})$, $i \in \{1, 2, 3, 4\}$. Therefore, $\mathrm{mast}\{T_1, T_2\} \le 2 \times \mathrm{mast}\{T_1^{(u_1)}, T_2^{(v_1)}\} \le 2 \times 2^{k-1} = 2^k$. □
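For small $k$ the bound can be checked by brute force. The sketch below uses the standard dynamic-programming recursion for the rooted maximum agreement subtree of two binary trees, which is a swapped-in technique and not the argument of the proof above. Trees are nested pairs, and the helper names (`balanced`, `mast`, `extremal_pair`) are hypothetical.

```python
from functools import lru_cache


def swap(seq):
    """Leaf order of T_2: recursively exchange the two middle quarters."""
    seq = list(seq)
    if len(seq) == 1:
        return seq
    q = len(seq) // 4
    s1, s2, s3, s4 = (seq[i * q:(i + 1) * q] for i in range(4))
    return swap(s1) + swap(s3) + swap(s2) + swap(s4)


def balanced(labels):
    """Balanced binary tree (nested pairs) whose leaves read `labels` left to right."""
    labels = list(labels)
    if len(labels) == 1:
        return labels[0]
    half = len(labels) // 2
    return (balanced(labels[:half]), balanced(labels[half:]))


def leaves(tree):
    """Leaf set of a nested-pair tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


@lru_cache(maxsize=None)
def mast(t1, t2):
    """Number of leaves in a largest rooted agreement subtree of t1 and t2."""
    if not isinstance(t1, tuple):
        return int(t1 in leaves(t2))
    if not isinstance(t2, tuple):
        return int(t2 in leaves(t1))
    (a, b), (c, d) = t1, t2
    return max(
        mast(a, c) + mast(b, d),   # children matched straight
        mast(a, d) + mast(b, c),   # children matched crosswise
        mast(a, t2), mast(b, t2),  # discard one child of t1
        mast(t1, c), mast(t1, d),  # discard one child of t2
    )


def extremal_pair(k):
    """The pair (T_1, T_2) of height 2k described in the text."""
    labels = list(range(1, 4 ** k + 1))
    return balanced(labels), balanced(swap(labels))


if __name__ == "__main__":
    for k in (1, 2):
        print(k, mast(*extremal_pair(k)))
```

For $k = 2$ the leaves 1, 4, 13, 16 induce the same rooted tree $((1,4),(13,16))$ in both $T_1$ and $T_2$, so the bound $2^k = 4$ is attained in that case.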

An analogous but slightly weaker statement holds for unrooted trees. We use the same labelling scheme as in the rooted case, but remove the roots, i.e., we delete the vertex $u$ and the edges $\{u, \ell(u)\}$ and $\{u, r(u)\}$, and add an edge $\{\ell(u), r(u)\}$, and similarly make $T_2$ unrooted.

Proposition 23. Trees $T_1$ and $T_2$ have no agreement subtree with more than $3 \times 2^{k-1}$ leaves.
Proof. We prove the result by induction on $k$. When $k = 1$, the trees have an agreement subtree on 3 leaves, but not 4. In the general case, as in the rooted case, $L(T_1^{(u_1)}) = L(T_2^{(v_1)})$, $L(T_1^{(u_2)}) = L(T_2^{(v_3)})$, $L(T_1^{(u_3)}) = L(T_2^{(v_2)})$, and $L(T_1^{(u_4)}) = L(T_2^{(v_4)})$. For $i \in \{1, 2, 3, 4\}$, let $A_i$ denote the set of leaves from $L(T_1^{(u_i)})$ that are in a maximum agreement subtree $R$. But an agreement subtree cannot have leaves from all four sets $L(T_1^{(u_i)})$, $i \in \{1, 2, 3, 4\}$. Therefore, $A_i = \emptyset$ for some $i \in \{1, 2, 3, 4\}$.

Case 1: Two or three of the four sets $A_i$ are non-empty. Without loss of generality, let $A_1$ and $A_2$ (and possibly also $A_3$) be non-empty. For $i \in \{1, 2, 3\}$, let $x_i$ be the most recent common ancestor of leaves in $A_i$ in the rooted subtree $T_1^{(u_i)}$. Now we observe that $R|A_1$ rooted at $x_1$ is a rooted agreement subtree of $T_1^{(u_1)}$ and $T_2^{(v_1)}$, and $R|A_2$ rooted at $x_2$ is a rooted agreement subtree of $T_1^{(u_2)}$ and $T_2^{(v_3)}$, and, if $A_3$ is non-empty, $R|A_3$ rooted at $x_3$ is a rooted agreement subtree of $T_1^{(u_3)}$ and $T_2^{(v_2)}$. Therefore, $|L(R)| = |A_1 \cup A_2 \cup A_3| \le 3 \times 2^{k-1}$ (by Proposition 22).

Case 2: A maximum agreement subtree $R$ has leaves from only one of the four sets $L(T_1^{(u_i)})$, $i \in \{1, 2, 3, 4\}$. Without loss of generality, let $L(R) \subseteq L(T_1^{(u_1)})$. In this case, by induction, $\mathrm{mast}\{T_1, T_2\} \le 3 \times 2^{k-2} < 3 \times 2^{k-1}$. □



We end by presenting another proof of the rooted case with a different labelling scheme. Since $T_1$ and $T_2$ have height $2k$, each node at depth $k$ in these trees is the root of a balanced tree of height $k$ (they are in the middle level). Let the nodes at depth $k$ in $T_1$ be called $u_1, u_2, \ldots, u_{2^k}$. Similarly, let $v_1, v_2, \ldots, v_{2^k}$ be the nodes at depth $k$ in $T_2$. Consider a $2^k \times 2^k$ matrix $A$ with its entries being precisely the elements of $\{1, 2, \ldots, 2^{2k}\}$ without repetition, placed arbitrarily. For each $i \in \{1, 2, \ldots, 2^k\}$, let the leaves of $T_1^{(u_i)}$ be labelled with entries in the $i$-th row of $A$. Similarly, for each $j \in \{1, 2, \ldots, 2^k\}$, let the leaves of $T_2^{(v_j)}$ be labelled with entries in the $j$-th column of $A$. Hence, we have

\[ |L(T_1^{(u_i)}) \cap L(T_2^{(v_j)})| = 1 \tag{19} \]

for all $i, j \in \{1, 2, \ldots, 2^k\}$.
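Property (19) is simply the statement that a row and a column of a matrix with distinct entries share exactly one entry. A small sketch of the labelling, with a random placement standing in for "placed arbitrarily" (the seed, the function names and the use of `random` are assumptions made here for illustration):

```python
import random


def matrix_labelling(k, seed=0):
    """Return (rows, cols): leaf sets of the depth-k subtrees of T_1 and T_2."""
    side = 2 ** k
    entries = list(range(1, side * side + 1))
    random.Random(seed).shuffle(entries)       # arbitrary placement in A
    matrix = [entries[r * side:(r + 1) * side] for r in range(side)]
    rows = [set(row) for row in matrix]        # L(T_1^{(u_i)}): i-th row
    cols = [set(col) for col in zip(*matrix)]  # L(T_2^{(v_j)}): j-th column
    return rows, cols


def satisfies_19(rows, cols):
    """Check |L(T_1^{(u_i)}) ∩ L(T_2^{(v_j)})| = 1 for all i, j."""
    return all(len(r & c) == 1 for r in rows for c in cols)


if __name__ == "__main__":
    rows, cols = matrix_labelling(3)
    print(satisfies_19(rows, cols))
```

The check passes for every placement, since row $i$ and column $j$ meet only in the entry $A_{ij}$.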



Alternative proof of Proposition 22. Assume, for the purpose of contradiction, that there exists a common subtree $S$ of $T_1$ and $T_2$ with more than $2^k$ leaves. Let $f_1$ and $f_2$ be maps that realize $S$ as a subtree of $T_1$ and of $T_2$, respectively. Note that

\[ \mathrm{depth}_S(x) \le \mathrm{depth}_{T_i}(f_i(x)) \tag{20} \]

for any $x \in V(S)$, and $i = 1, 2$. By the assumption on $S$, there must be an internal node $w$ of $S$ of depth at least $k$. Hence, by (20), we have $f_1(w) \in V(T_1^{(u_i)})$ for some $i$, and $f_2(w) \in V(T_2^{(v_j)})$ for some $j$. But $S^{(w)}$ has at least two leaves, which must be simultaneously in $L(T_1^{(u_i)})$ and $L(T_2^{(v_j)})$. This contradicts (19). The proposition follows. □